127 research outputs found

    Psychometric scaling of TID2013 dataset

    Get PDF
    TID2013 is a subjective image quality assessment dataset with a wide range of distortion types and over 3000 images. The dataset has proven to be a challenging test for objective quality metrics. The dataset mean opinion scores were obtained by collecting pairwise comparison judgments using the Swiss tournament system, and averaging the votes of observers. However, this approach differs from the usual analysis of multiple pairwise comparisons, which involves psychometric scaling of the comparison data using either Thurstone or Bradley-Terry models. In this paper we investigate how quality scores change when they are computed using such psychometric scaling instead of averaging vote counts. In order to properly scale TID2013 quality scores, we conduct four additional experiments of two different types, which we found necessary to produce a common quality scale: comparisons with reference images, and cross-content comparisons. We demonstrate on a fifth validation experiment that the two additional types of comparisons are necessary and, in conjunction with psychometric scaling, improve the consistency of quality scores, especially across images depicting different contents.
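
    The difference between vote averaging and psychometric scaling can be illustrated with a minimal sketch: the snippet below fits a Bradley-Terry model to a toy pairwise-comparison count matrix by maximum likelihood and contrasts the result with the fraction of votes won. The count matrix and function names are illustrative only and are not taken from TID2013.

```python
# Minimal Bradley-Terry scaling of a toy pairwise-comparison count matrix.
# Illustrative sketch only; the data below is not from TID2013.
import numpy as np
from scipy.optimize import minimize

# wins[i, j] = number of times condition i was preferred over condition j
wins = np.array([
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
], dtype=float)

def neg_log_likelihood(scores):
    # Bradley-Terry: P(i beats j) = exp(s_i) / (exp(s_i) + exp(s_j))
    diff = scores[:, None] - scores[None, :]
    log_p = -np.log1p(np.exp(-diff))
    return -(wins * log_p).sum()

res = minimize(neg_log_likelihood, x0=np.zeros(wins.shape[0]), method="BFGS")
scaled = res.x - res.x[0]          # anchor the first condition at zero
print("Psychometrically scaled scores:", scaled)

# The vote-averaging approach used originally, for comparison:
vote_fraction = wins.sum(axis=1) / (wins + wins.T).sum(axis=1)
print("Fraction of votes won:        ", vote_fraction)
```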

    Dataset and metrics for predicting local visible differences

    Get PDF
    A large number of imaging and computer graphics applications require localized information on the visibility of image distortions. Existing image quality metrics are not suitable for this task as they provide a single quality value per image. Existing visibility metrics produce visual difference maps and are specifically designed to detect just-noticeable distortions, but their predictions are often inaccurate. In this work, we argue that the key reason for this problem is the lack of large image collections with a good coverage of possible distortions that occur in different applications. To address the problem, we collect an extensive dataset of reference and distorted image pairs together with user markings indicating whether distortions are visible or not. We propose a statistical model that is designed for the meaningful interpretation of such data, which is affected by visual search and imprecision of manual marking. We use our dataset for training existing metrics and demonstrate that their performance significantly improves. We show that our dataset with the proposed statistical model can be used to train a new CNN-based metric, which outperforms the existing solutions. We demonstrate the utility of such a metric in visually lossless JPEG compression, super-resolution and watermarking.

    A benchmark of light field view interpolation methods

    Get PDF
    Light field view interpolation provides a solution that reduces the prohibitive size of a dense light field. This paper examines state-of-the-art light field view interpolation methods with a comprehensive benchmark on challenging scenarios specific to interpolation tasks. Each method is analyzed in terms of its strengths and weaknesses in handling different challenges. We find that large disparities in a scene are the main source of difficulty for light field view interpolation methods. We also find that a basic backward warping based on depth estimated from optical flow provides performance comparable to that of typically more complex learning-based methods.
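
    As a rough illustration of the backward-warping baseline mentioned above, the sketch below renders a novel light field view by shifting each target pixel by a disparity-scaled offset and bilinearly sampling the reference view. The synthetic disparity map and the view offsets are assumptions made for the example; in the benchmark the disparity would come from optical-flow-based depth estimation.

```python
# Sketch of disparity-based backward warping between light field views.
# The disparity map here is synthetic; in practice it would be estimated
# from optical flow between neighbouring views.
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(ref_view, disparity, du, dv):
    """Render the view at angular offset (du, dv) by sampling ref_view.

    ref_view:  (H, W) grayscale reference view (color images are warped per channel)
    disparity: (H, W) per-pixel disparity between adjacent views
    du, dv:    angular offset of the target view in view-grid units
    """
    h, w = ref_view.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Backward warping: for every target pixel, look up its source in the reference.
    src_x = xs + du * disparity
    src_y = ys + dv * disparity
    return map_coordinates(ref_view, [src_y, src_x], order=1, mode="nearest")

# Toy usage with a horizontal ramp image and constant disparity.
ref = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
disp = np.full((64, 64), 1.5)
novel = backward_warp(ref, disp, du=2.0, dv=0.0)
print(novel.shape)
```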

    Exploiting synthetically generated data with semi-supervised learning for small and imbalanced datasets

    Get PDF
    Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated by simple transformations or through the data distribution. In the latter case, the main challenge is to estimate the label associated with new synthetic patterns. This paper studies the effect of generating synthetic data by convex combination of patterns and using these as unsupervised information in a semi-supervised learning framework with support vector machines, thus avoiding the need to label synthetic examples. We perform experiments on a total of 53 binary classification datasets. Our results show that this type of data over-sampling supports the well-known cluster assumption in semi-supervised learning, showing outstanding results for small high-dimensional datasets and imbalanced learning problems.
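
    A minimal sketch of the idea, assuming scikit-learn is available: synthetic points are drawn as convex combinations of random pairs of training patterns and passed to a semi-supervised learner as unlabeled data, so they never need labels. SelfTrainingClassifier wrapped around an SVC is an illustrative stand-in for the semi-supervised SVM used in the paper, not the authors' exact setup.

```python
# Sketch: synthetic patterns via convex combinations, used as unlabeled data.
# SelfTrainingClassifier + SVC stands in for the paper's semi-supervised SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=60, n_features=20, random_state=0)

# Convex combinations of random pairs of existing patterns.
i, j = rng.integers(0, len(X), size=(2, 200))
lam = rng.uniform(0.0, 1.0, size=(200, 1))
X_synth = lam * X[i] + (1.0 - lam) * X[j]

# Synthetic points stay unlabeled (-1), so no label estimation is required.
X_all = np.vstack([X, X_synth])
y_all = np.concatenate([y, -np.ones(len(X_synth), dtype=int)])

model = SelfTrainingClassifier(SVC(probability=True, gamma="scale"))
model.fit(X_all, y_all)
print("Accuracy on the labeled points:", model.score(X, y))
```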

    Learning Foveated Reconstruction to Preserve Perceived Image Statistics

    Get PDF
    Foveated image reconstruction recovers a full image from a sparse set of samples distributed according to the human visual system's retinal sensitivity, which rapidly drops with eccentricity. Recently, the use of Generative Adversarial Networks (GANs) was shown to be a promising solution for such a task, as they can successfully hallucinate missing image information. As with other supervised learning approaches, the definition of the loss function and the training strategy heavily influence the output quality. In this work, we pose the question of how to efficiently guide the training of foveated reconstruction techniques so that they are fully aware of the human visual system's capabilities and limitations, and therefore reconstruct visually important image features. Due to the nature of GAN-based solutions, we concentrate on human sensitivity to hallucinations at different input sample densities. We present new psychophysical experiments, a dataset, and a procedure for training foveated image reconstruction. The strategy provides flexibility to the generator network by penalizing only perceptually important deviations in the output. As a result, the method aims to preserve perceived image statistics rather than natural image statistics. We evaluate our strategy and compare it to alternative solutions using a newly trained objective metric and user experiments.
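
    The eccentricity-dependent sampling described above can be sketched as a random mask whose density falls off with distance from the gaze point. The exponential falloff rate below is an arbitrary illustrative choice, not the retinal sensitivity model used in the paper.

```python
# Sketch of an eccentricity-dependent sampling mask for foveated reconstruction input.
# The exponential falloff rate is an illustrative assumption, not the paper's model.
import numpy as np

def foveated_mask(height, width, gaze_xy, falloff=0.01, seed=0):
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:height, 0:width]
    eccentricity = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])
    # Sampling probability drops with distance from the gaze point.
    prob = np.exp(-falloff * eccentricity)
    return rng.random((height, width)) < prob

mask = foveated_mask(256, 256, gaze_xy=(128, 128))
print("Fraction of pixels sampled:", mask.mean())
```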

    Optimizing vision and visuals: lectures on cameras, displays and perception

    Get PDF
    The evolution of the internet is underway, where immersive virtual 3D environments (commonly known as the metaverse or telelife) will replace flat 2D interfaces. Crucial ingredients in this transformation are next-generation displays and cameras representing genuinely 3D visuals while meeting the human visual system's perceptual requirements. This course will provide a fast-paced introduction to optimization methods for next-generation interfaces geared towards immersive virtual 3D environments. Firstly, we will introduce lensless cameras for high-dimensional compressive sensing (e.g., single-exposure capture of a video or one-shot 3D). Our audience will learn to process images from a lensless camera by the end. Secondly, we will introduce holographic displays as a potential candidate for next-generation displays. By the end of this course, you will learn to create your own 3D images that can be viewed using a standard holographic display. Lastly, we will introduce perceptual guidance that could be an integral part of the optimization routines of displays and cameras. Our audience will gather experience in integrating perception into display and camera optimizations. This course targets a wide range of audiences, from domain experts to newcomers. To do so, the examples in this course will be based on our in-house toolkit so that they remain replicable for future use. The course material will provide example code and a broad survey with crucial information on cameras, displays and perception.

    Objective and subjective evaluation of High Dynamic Range video compression

    Get PDF
    A number of High Dynamic Range (HDR) video compression algorithms proposed to date have either been developed in isolation or only partially compared with each other. Previous evaluations were conducted using quality assessment error metrics, which for the most part were developed for qualitative assessment of Low Dynamic Range (LDR) videos. This paper presents a comprehensive objective and subjective evaluation conducted with six published HDR video compression algorithms. The objective evaluation was undertaken on a large set of 39 HDR video sequences using seven numerical error metrics, namely PSNR, logPSNR, puPSNR, puSSIM, Weber MSE, HDR-VDP and HDR-VQM. The subjective evaluation involved six short-listed sequences and two ranking-based subjective experiments with hidden reference at two different output bitrates, with 32 participants each, who were tasked to rank distorted HDR video footage against an uncompressed version of the same footage. Results suggest a strong correlation between the objective and subjective evaluations. Also, non-backward-compatible compression algorithms appear to perform better at lower output bitrates than backward-compatible algorithms across the settings used in this evaluation.
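
    Several of the listed metrics are PSNR computed after remapping HDR luminance into a domain that better matches perception. The sketch below shows plain PSNR and a logPSNR variant on linear-luminance frames; the peak luminance and clipping floor are illustrative assumptions, and the perceptually uniform (PU) encoding behind puPSNR/puSSIM is not reproduced here.

```python
# Sketch of PSNR and logPSNR for HDR frames given as linear luminance (cd/m^2).
# Peak luminance and the clipping floor are illustrative assumptions.
import numpy as np

def psnr(ref, test, peak):
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def log_psnr(ref, test, peak=4000.0, floor=0.01):
    # Compare log10 luminance so errors in dark regions are not swamped by bright ones.
    ref_log = np.log10(np.clip(ref, floor, peak))
    test_log = np.log10(np.clip(test, floor, peak))
    return psnr(ref_log, test_log, peak=np.log10(peak / floor))

rng = np.random.default_rng(0)
ref = rng.uniform(0.01, 4000.0, size=(64, 64))
test = ref * rng.normal(1.0, 0.02, size=ref.shape)
print("PSNR:    %.2f dB" % psnr(ref, test, peak=4000.0))
print("logPSNR: %.2f dB" % log_psnr(ref, test))
```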

    BitNet: Learning-Based Bit-Depth Expansion

    Full text link
    Bit-depth is the number of bits for each color channel of a pixel in an image. Although many modern displays support unprecedentedly high bit-depths to show more realistic and natural colors with a high dynamic range, most media sources still have a bit-depth of 8 or lower. Since insufficient bit-depth may generate annoying false contours or lose detailed visual appearance, bit-depth expansion (BDE) from low bit-depth (LBD) images to high bit-depth (HBD) images becomes more and more important. In this paper, we adopt a learning-based approach for BDE and propose a novel CNN-based bit-depth expansion network (BitNet) that can effectively remove false contours and restore visual details at the same time. We have carefully designed our BitNet based on an encoder-decoder architecture with dilated convolutions and a novel multi-scale feature integration. We have performed various experiments with four different datasets including MIT-Adobe FiveK, Kodak, ESPL v2, and TESTIMAGES, and our proposed BitNet has achieved state-of-the-art performance in terms of PSNR and SSIM compared with existing BDE methods and well-known CNN-based image processing networks. Unlike previous methods that separately process each color channel, we treat all RGB channels at once and thereby greatly improve color restoration. In addition, our network has shown the fastest computational speed, running in near real-time. Comment: Accepted by ACCV 2018; authors Byun and Shim contributed equally.
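
    For contrast with the learned approach, the simplest BDE baselines just rescale or bit-replicate the low bit-depth codes, which preserves the quantization staircase and hence the false contours BitNet is trained to remove. The sketch below implements those baselines; it is not BitNet itself, and the bit-depth values are example parameters.

```python
# Naive bit-depth expansion baselines: zero padding and bit replication.
# These keep the quantization staircase (false contours); they are not BitNet.
import numpy as np

def zero_pad_expand(img_lbd, low_bits=4, high_bits=8):
    # Shift the codes into the high bits and leave the new low bits at zero.
    return img_lbd.astype(np.uint16) << (high_bits - low_bits)

def bit_replicate_expand(img_lbd, low_bits=4, high_bits=8):
    # Repeat the most significant LBD bits into the new low bits.
    shift = high_bits - low_bits
    hbd = img_lbd.astype(np.uint16) << shift
    return hbd | (img_lbd.astype(np.uint16) >> (low_bits - shift))

lbd = np.arange(16, dtype=np.uint16)   # all possible 4-bit codes
print(zero_pad_expand(lbd))            # 0, 16, 32, ..., 240
print(bit_replicate_expand(lbd))       # 0, 17, 34, ..., 255
```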
    • …